I. Introduction


Programmable logic arrays (PLAs) provide an efficient and flexible way to implement general modules for combinational systems in a regular manner. Similarly, programmable sequential arrays (PSAs) can be formed by including storage cells together with the logic; these arrays can be programmed to implement general modules of sequential systems. To implement a Boolean function with a PLA, the function is first expressed in sum-of-products form. A two-stage NOR network (with NOT functions added at both the input and the output) is then used to map the logic equation to gates.
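As a minimal sketch of this mapping (a software model, not a circuit), the Python below evaluates a sum-of-products function through two NOR stages. The input NOT stage supplies each AND-plane NOR gate with the complement of every literal in its term, so each NOR row computes one product term, and the output NOT restores the polarity of the OR-plane NOR. The personality tables `AND_PLANE` and `OR_PLANE` and the example function F = AB' + C are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical personality for F = A B' + C over inputs (A, B, C).
# 1 = literal used in true form, 0 = used complemented, None = not used.
AND_PLANE = [
    (1, 0, None),     # term 0: A B'
    (None, None, 1),  # term 1: C
]
# Each OR-plane row marks which product terms feed that output.
OR_PLANE = [
    (1, 1),  # F = term 0 + term 1
]


def nor(bits):
    """Logical NOR of a sequence of 0/1 values (NOR of nothing is 1)."""
    return int(not any(bits))


def eval_pla(inputs):
    """Evaluate the PLA as input NOT -> NOR plane -> NOR plane -> output NOT."""
    terms = []
    for row in AND_PLANE:
        gate_inputs = []
        for x, lit in zip(inputs, row):
            if lit == 1:
                gate_inputs.append(1 - x)  # true literal: NOR sees x'
            elif lit == 0:
                gate_inputs.append(x)      # complemented literal: NOR sees x
        terms.append(nor(gate_inputs))     # NOR of complements = product term
    outputs = []
    for row in OR_PLANE:
        f = nor([t for t, used in zip(terms, row) if used])  # OR-plane NOR
        outputs.append(1 - f)                                 # output inverter
    return outputs


def reference(a, b, c):
    """Direct evaluation of F = A B' + C."""
    return int((a and not b) or c)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        assert eval_pla([a, b, c])[0] == reference(a, b, c)
    print("NOR/NOR mapping matches F = AB' + C for all inputs")
```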


While the nMOS realization of NOR gates is good and straightforward, static CMOS/bulk NOR gates present several problems [1]. First, while the better devices (n-channels) are in parallel, the worse devices (p-channels) are in series, which makes the gate slow. In practice, a NAND/NAND structure is therefore generally used, which puts the better devices (n-channels) in series instead. Second, since both n-channel and p-channel devices are required, well locations must be carefully arranged and the total area needed is large. The domino CMOS [2] method is used to implement small precharge PLAs in CMOS/bulk. The standard approach is to use a precharge NAND structure to implement the AND plane of the PLA and a precharge NOR structure to implement the OR plane, with inverters between the planes and at the outputs. This approach is attractive because there is no possibility of charge sharing in precharge NOR gates. However, the main disadvantage of this domino PLA is that, with large input terms, the series AND in the NAND gates is still slow: the delay grows quadratically with the number of literals in series. As a result, it is desirable to have a precharge NOR/NOR type of PLA structure in CMOS. Unfortunately, precharge NOR gates cannot be concatenated directly to form a NOR/NOR PLA structure, because the output of a precharge NOR gate goes from one to zero during evaluation. If this output is connected directly to the input of another precharge NOR gate, undesired discharging occurs: while the first gate still holds its precharged one, the second gate sees a one at its input and discharges, and a dynamic node cannot recover until the next precharge.
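Both effects can be illustrated with a small Python sketch. `elmore_series_delay` uses the Elmore approximation for a chain of n identical series transistors (each with on-resistance r and a node capacitance c, an assumed uniform lumped model), giving r·c·n(n+1)/2, which is quadratic in n. `cascade_stage2_trace` is a hypothetical discrete-time model of two directly cascaded precharge NOR gates: the first stage's output stays at its precharged one for `stage1_delay` steps, and the second stage evaluates from `stage2_start` onward.

```python
def nor(bits):
    """Logical NOR of a sequence of 0/1 values."""
    return int(not any(bits))


def elmore_series_delay(n, r, c):
    """Elmore delay of n series transistors, each of resistance r, with a
    capacitance c at every node of the chain (uniform lumped model).
    Node i sees i resistors to ground, so the delay is r*c*(1 + 2 + ... + n)."""
    return r * c * n * (n + 1) / 2


def cascade_stage2_trace(stage1_inputs, stage1_delay, stage2_start, steps=6):
    """Output trace of a precharge NOR whose only input is another precharge NOR.

    Both nodes start precharged high. Stage 1 reaches its final value only
    after `stage1_delay` steps; stage 2 evaluates from `stage2_start` onward.
    """
    stage1_final = nor(stage1_inputs)
    out2 = 1  # stage 2 node precharged high
    trace = []
    for t in range(steps):
        # Until it has discharged, stage 1 still shows its precharged 1.
        stage1 = 1 if t < stage1_delay else stage1_final
        # While evaluating, a 1 at the input discharges stage 2; a dynamic
        # node cannot be pulled back up before the next precharge phase.
        if t >= stage2_start and stage1 == 1:
            out2 = 0
        trace.append(out2)
    return trace


if __name__ == "__main__":
    print(elmore_series_delay(4, 1.0, 1.0), elmore_series_delay(8, 1.0, 1.0))
    # Stage 1 has an active input, so its correct output is 0 and stage 2
    # should end at NOR(0) = 1.
    print(cascade_stage2_trace([1, 0], stage1_delay=2, stage2_start=0))  # same clock
    print(cascade_stage2_trace([1, 0], stage1_delay=2, stage2_start=2))  # deferred
```

Doubling the chain from 4 to 8 transistors raises the delay from 10rc to 36rc. When both stages evaluate on the same clock, stage 2 discharges at once and stays wrong; deferring its evaluation until stage 1 has settled keeps it at the correct value of one.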



Thus, some timing strategy must be included to prevent unwanted discharging. In this paper, a delayed clock is used to precharge the OR plane, which allows NOR gates to be concatenated. The circuit and its timing strategy are discussed in the following sections. This PLA structure has been incorporated into the Berkeley PLA tools [3]. Optimization and automatic generation of general finite state machines (FSMs) are available to the public.
II. Circuit Description


Several dynamic CMOS NOR/NOR PLA structures have been suggested [1], [4], [5], [6], [9]. The proposed approach is similar to [4]. A schematic diagram of the circuitry is shown in Figure 1. The AND plane consists of ordinary precharge NOR gates arranged as one row per implicant. It is precharged while the clock is low and evaluated while the clock is high. On top of the AND plane is a dummy row. For each input literal, a diffusion area equal to the drain area of a pull-down transistor is added to the dummy row. As a result, the total parasitic load on the dummy row is greater than that of any implicant row. Since the dummy row is precharged by the same clock as the implicant rows, it discharges at the worst-case rate of all the rows. This slowest "dummy implicant" is inverted to produce the delayed clock, which is then used to precharge and evaluate the OR-plane NOR gates. By the time the delayed clock rises to evaluate the OR-plane logic, every implicant has settled to its final high or low state. Hence, the OR-plane NOR gates cannot be discharged undesirably. Outputs from the OR-plane NOR gates are latched by static flip-flops, which are gated by the inverse of the input clock and by the delayed clock. In effect, a latched output changes state only after the trailing edge of the input clock. This guarantees that no undesirable discharging occurs when blocks of these PSAs are concatenated. Moreover, this PSA structure is static from a system point of view.
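The self-timing argument can be sketched numerically. The Python below assumes a deliberately simple first-order RC model (not taken from the original design): every row has a wiring capacitance per literal column it crosses plus one drain capacitance per attached pull-down, and its worst case is discharge through a single conducting pull-down to Vdd/2. The constants `C_DRAIN`, `C_WIRE`, `R_ON` and the row sizes are hypothetical. Because the dummy row carries a drain diffusion on every literal column, its delay bounds that of every implicant row, so the delayed clock (dummy delay plus an inverter delay `t_inv`) rises after the slowest implicant has settled.

```python
import math

# Hypothetical first-order parameters (arbitrary consistent units).
C_DRAIN = 1.0  # drain capacitance of one pull-down transistor
C_WIRE = 0.5   # wiring capacitance per literal column crossed by a row
R_ON = 1.0     # on-resistance of a single conducting pull-down


def row_capacitance(n_columns, n_pulldowns):
    """Total node capacitance of one AND-plane row."""
    return n_columns * C_WIRE + n_pulldowns * C_DRAIN


def worst_discharge_time(c_row):
    """Worst case: one pull-down conducts; time to fall from Vdd to Vdd/2."""
    return R_ON * c_row * math.log(2)


def delayed_clock_rise(pulldowns_per_row, n_columns, t_inv):
    """Return (delayed-clock rise time, slowest implicant settling time)."""
    # An implicant row can have at most one pull-down per literal column.
    assert all(k <= n_columns for k in pulldowns_per_row)
    # The dummy row has a drain diffusion on every literal column.
    t_dummy = worst_discharge_time(row_capacitance(n_columns, n_columns))
    t_slowest = max(
        worst_discharge_time(row_capacitance(n_columns, k)) for k in pulldowns_per_row
    )
    return t_dummy + t_inv, t_slowest


if __name__ == "__main__":
    # Hypothetical AND plane: 4 inputs (8 literal columns), rows with 3, 2, 4 pull-downs.
    t_clk, t_slowest = delayed_clock_rise([3, 2, 4], n_columns=8, t_inv=0.2)
    print(f"delayed clock rises at {t_clk:.3f}, slowest implicant settles at {t_slowest:.3f}")
```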

In the absence of clock switching, the output latches stay unchanged, holding their previous values. A detailed circuit diagram implementing the logic function $F(A,B,C,D,E) = ABC + DE$ is given in Figure 2. The logic is as follows: since $I_1 = (A' + B' + C')' = ABC$, $I_2 = (D' + E')' = DE$, $f = (I_1 + I_2)'$, and $F = f'$, we have $F = ABC + DE$. Cut-off transistors are used in both the AND and OR planes of the PSA to ensure that there is no direct path from Vdd to ground during the precharge phase. While each AND-plane NOR gate needs only one cut-off transistor to disconnect its inputs during precharge, each OR-plane NOR gate needs two transistors in series to cut off its inputs during the precharge period. The detailed timing strategy is discussed in the following section.
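The derivation can be checked exhaustively. This Python sketch is a logic-level model only (cut-off transistors and clocking are ignored): it mirrors the gate structure of the equations above, with AND-plane NOR rows fed by complemented inputs, an OR-plane NOR, and an output inverter, and compares the result with ABC + DE over all 32 input combinations. The function names are illustrative.

```python
from itertools import product


def nor(*xs):
    """Logical NOR of 0/1 values."""
    return int(not any(xs))


def psa_f(a, b, c, d, e):
    """Logic-level model of the Figure 2 circuit (clocking ignored)."""
    i1 = nor(1 - a, 1 - b, 1 - c)  # AND-plane row: I1 = (A' + B' + C')' = ABC
    i2 = nor(1 - d, 1 - e)         # AND-plane row: I2 = (D' + E')' = DE
    f = nor(i1, i2)                # OR-plane NOR: f = (I1 + I2)'
    return 1 - f                   # output inverter: F = f'


if __name__ == "__main__":
    for a, b, c, d, e in product((0, 1), repeat=5):
        assert psa_f(a, b, c, d, e) == int((a and b and c) or (d and e))
    print("F = ABC + DE holds for all 32 input combinations")
```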
III. Timing and Electrical Design Considerations


Only a single input clock is required, which reduces the routing area used for clock signals between blocks. Moreover, a single-phase clocked functional block, such as the proposed PSA, simplifies the overall system timing strategy [7]. The additional clocks needed are generated by the PSA itself. A timing diagram is given in Figure 3. A total of four clock signals, containing eight clock edges, control this PSA structure; the edges are labeled A through H in Figure 3.

Region 1 is bounded by edge H of the previous cycle and edge C. Region 2 lies between edges C and D, region 3 between edges D and E, and region 4 between edges F and H.

During region 1, the AND plane and the OR plane of the PSA are precharged. During region 2, the AND plane evaluates. During region 3, the OR plane evaluates. Finally, the output is latched during region 4. A timing gap between regions 3 and 4 ensures that clock overlap or clock skew cannot cause undesirable discharging of the dynamic NOR gates. Inputs must be valid before edge A, and outputs become valid shortly after edge H. Outputs remain unchanged until shortly after the next edge H. As a result, the outputs of this PSA structure can be used directly as inputs to the same PSA or to other PSAs. The total delay of the worst-case dummy row and the worst-case OR plane must not exceed the clock pulse width. The clock period must be longer than the sum of the output latch settling time, the total delay through the PLA, and Δ, where Δ is the delay contributed by inverting the input clock. This single-phase clocking strategy has no two-sided timing constraints to satisfy [8].
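The two timing conditions can be written as a small check. In this Python sketch the "total delay through the PLA" is assumed to be the worst-case dummy-row delay plus the worst-case OR-plane delay, and the pulse width is assumed to be the high phase of the input clock. The field names of `PsaTiming` are hypothetical, and the delays are placeholders to be taken from simulation or measurement.

```python
from dataclasses import dataclass


@dataclass
class PsaTiming:
    """Worst-case delays of one PSA block (any consistent time unit)."""
    t_dummy: float  # dummy-row (AND-plane) delay
    t_or: float     # OR-plane delay
    t_latch: float  # output-latch settling time
    delta: float    # delay from inverting the input clock


def check_timing(t, pulse_width, period):
    """Evaluate the pulse-width and clock-period conditions for one PSA."""
    pla_delay = t.t_dummy + t.t_or
    period_bound = t.t_latch + pla_delay + t.delta
    return {
        "fits_pulse": pla_delay <= pulse_width,  # both planes settle within the pulse
        "fits_period": period > period_bound,    # period must exceed the sum
        "period_bound": period_bound,
    }


if __name__ == "__main__":
    # Placeholder delays, not measured values.
    timing = PsaTiming(t_dummy=4.0, t_or=3.0, t_latch=2.0, delta=1.0)
    print(check_timing(timing, pulse_width=10.0, period=20.0))
```

Because the period condition is a single lower bound and there is no matching upper bound on any path, a block that fails it can be fixed simply by slowing the clock.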

To avoid noise problems, the layout should not connect Vdd or ground through the diffusion layer. Although using diffusion may give a more compact circuit, the resulting noise problems and the slowdown due to diffusion resistance do not pay off. We connect all gated-ground and gated-Vdd lines with the metal layer only. Conforming to the MOSIS scalable CMOS rules (rev. 6), we obtain an 8x12 lambda pitch for the AND plane and a 12x16 lambda pitch for the OR plane.


IV. Example


A 4-bit counter is implemented. First, a finite state machine is described as a PEG [3] specification. The PEG [3] software then translates it automatically into logic-equation format. The logic equations are converted to a truth table with EQNTOTT [3], and ESPRESSO [3] is used to minimize the truth table. Finally, MPLA [3] generates the layout in Magic format. The resulting PSA measures 194 µm by 343 µm. The counter was fabricated on a MOSIS 2 µm TinyChip and is functional at a clock frequency of 50 MHz. The layout of the fabricated chip is shown in Figure 4.
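For illustration, the next-state logic of a counter can be written directly as a PLA personality. The Python sketch below assumes a free-running 4-bit binary counter with no enable or reset input (the original specification is not given) and uses a hand-derived two-level cover with ten product terms; it is not the ESPRESSO output for the fabricated design. Each AND-plane term maps a state-bit index to its required value, and each OR-plane entry lists the terms feeding one next-state bit.

```python
# Hypothetical cover for a free-running 4-bit binary counter (state bits Q3..Q0).
# Each AND-plane term maps a state-bit index to the value it requires.
TERMS = [
    {0: 0},                    # t0: Q0'
    {1: 1, 0: 0},              # t1: Q1 Q0'
    {1: 0, 0: 1},              # t2: Q1' Q0
    {2: 1, 1: 0},              # t3: Q2 Q1'
    {2: 1, 0: 0},              # t4: Q2 Q0'
    {2: 0, 1: 1, 0: 1},        # t5: Q2' Q1 Q0
    {3: 1, 2: 0},              # t6: Q3 Q2'
    {3: 1, 1: 0},              # t7: Q3 Q1'
    {3: 1, 0: 0},              # t8: Q3 Q0'
    {3: 0, 2: 1, 1: 1, 0: 1},  # t9: Q3' Q2 Q1 Q0
]

# OR plane: the product terms feeding each next-state bit D0..D3.
OUTPUTS = [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]


def next_state(state):
    """Evaluate the two-level cover once: one clock of the counter."""
    q = [(state >> i) & 1 for i in range(4)]
    terms = [int(all(q[i] == v for i, v in term.items())) for term in TERMS]
    d = [int(any(terms[k] for k in ks)) for ks in OUTPUTS]
    return sum(bit << i for i, bit in enumerate(d))


if __name__ == "__main__":
    state = 0
    for _ in range(16):
        state = next_state(state)
        print(format(state, "04b"))
```

Each next-state bit is Q_i XOR (Q_{i-1} ... Q_0) expanded into sum-of-products form, so stepping 16 times from 0000 visits 0001 through 1111 and returns to 0000.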


V. Conclusion

Programmable sequential arrays are useful parts of many digital designs. They can be used as building blocks of general finite-state machines and as controllers for a processor. The CMOS PSA structure described here provides a simple and flexible single-phase timing strategy. It uses a precharge CMOS NOR/NOR structure and imposes no practical limit on the number of input variables. Several blocks of this PSA can be concatenated to form a more complicated sequential machine. With existing software tools, fast and dense sequential blocks can be designed quickly. An example is given to illustrate the proposed structure.
(a) Intermediate clocks


Figure 2. Detailed diagram of the PSA circuit implementing F = ABC + DE
Figure 3. Timing diagram with operating regions
Figure 4. Layout of a 4-bit counter


